{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Highlights of the Code\n",
    "\n",
    "1. **Replication of Table E**  \n",
    "   This code replicates Table E from the Internet Appendix of the paper.\n",
    "\n",
    "<br>\n",
    "\n",
    "2. **Forecasting Risk Premiums with NN-3**  \n",
    "   The code demonstrates how to forecast risk premiums using a 3-layer Neural Network with Dropout (NN-3)  \n",
    "   and generates the corresponding forecast confidence intervals.\n",
    "\n",
    "<br>\n",
    "\n",
    "3. **175 Stock-Level Predictors as Inputs**  \n",
    "   It utilizes 175 stock-level predictors to produce NN-3-based risk premium forecasts  \n",
    "   along with their forecast standard errors.\n",
    "\n",
    "<br>\n",
    "\n",
    "4. **Model Retraining Schedule**  \n",
    "   - Models are retrained at the start of each year.  \n",
    "   - With out-of-sample data spanning 1987 to 2020, the models are retrained 34 times.  \n",
    "   - **Initial training sample**: 1957 to 1974.  \n",
    "   - The training sample expands recursively each year.  \n",
    "   - **Initial validation sample**: 1975 to 1986.  \n",
    "   - The validation sample advances forward annually.\n",
    "\n",
    "<br>\n",
    "\n",
    "5. **Factor Loadings Without Sampling Error**  \n",
    "   In its current form, the code assumes that factor loadings are estimated without sampling error.\n",
    "\n",
    "<br>\n",
    "\n",
    "6. **Alternative Specification with Estimation Uncertainty**  \n",
    "   The code offers an alternative specification to account for the estimation uncertainty of the factor loadings.  \n",
    "   To implement this, comment out the current factor loading specification and uncomment the alternative one.\n",
    "\n",
    "<br>\n",
    "\n",
    "7. **Copyright Protection**  \n",
    "   The code is intended solely for research and non-commercial purposes.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np # linear algebra\n",
    "import multiprocessing as mp\n",
    "import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)\n",
    "import seaborn as sns\n",
    "import numpy\n",
    "import pandas\n",
    "from joblib import Parallel, delayed\n",
    "import keras  \n",
    "from keras import regularizers\n",
    "from keras.models import Sequential\n",
    "from keras.layers import Dense\n",
    "from keras.wrappers.scikit_learn import KerasRegressor\n",
    "from sklearn.model_selection import cross_val_score\n",
    "from sklearn.model_selection import KFold\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "from sklearn.pipeline import Pipeline\n",
    "from keras.layers import BatchNormalization\n",
    "from keras.layers import Activation\n",
    "from keras import optimizers\n",
    "from sklearn.metrics import r2_score\n",
    "from keras import callbacks\n",
    "from keras.callbacks import EarlyStopping, ModelCheckpoint\n",
    "from keras.layers import Dropout\n",
    "from keras.models import load_model\n",
    "from sklearn.metrics import mean_squared_error\n",
    "from sklearn.linear_model import ElasticNet\n",
    "from numpy.linalg import inv\n",
    "from scipy import stats\n",
    "import scipy\n",
    "from scipy import stats\n",
    "from scipy.optimize import minimize\n",
    "from scipy.stats import invgauss\n",
    "from sklearn import linear_model\n",
    "#from scipy.stats import invgauss\n",
    "import scipy.stats as sc\n",
    "from numpy.linalg import matrix_rank\n",
    "from sklearn.linear_model import LinearRegression\n",
    "from sklearn.preprocessing import MinMaxScaler\n",
    "#import gc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np # linear algebra\n",
    "import multiprocessing as mp\n",
    "import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)\n",
    "import seaborn as sns\n",
    "import numpy\n",
    "import pandas\n",
    "from joblib import Parallel, delayed\n",
    "#from keras.initializers import HeNormal\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class MCDropout(Dropout):\n",
    "    def call(self, inputs, training=None):\n",
    "        # Activate dropout during prediction by setting `training=True`\n",
    "        if training is None:\n",
    "            training = tf.constant(True)  # Force dropout during inference (i.e., while making predictions)\n",
    "        return super().call(inputs, training=training)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import keras  \n",
    "from keras import regularizers\n",
    "from keras.models import Sequential\n",
    "from keras.layers import Dense\n",
    "from keras.wrappers.scikit_learn import KerasRegressor\n",
    "from sklearn.model_selection import cross_val_score\n",
    "from sklearn.model_selection import KFold\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "from sklearn.pipeline import Pipeline\n",
    "from keras.layers import BatchNormalization\n",
    "from keras.layers import Activation\n",
    "from keras import optimizers\n",
    "from sklearn.metrics import r2_score\n",
    "from keras import callbacks\n",
    "from keras.callbacks import EarlyStopping, ModelCheckpoint\n",
    "from keras.layers import Dropout\n",
    "from keras.models import load_model\n",
    "from sklearn.metrics import mean_squared_error"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "import tensorflow as tf\n",
    "seed=0\n",
    "\n",
    "random.seed(seed)\n",
    "np.random.seed(seed)\n",
    "tf.random.set_seed(seed)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def identify_near_constant_predictors(df, threshold=0.01):\n",
    "    # Calculate the ratio of unique values to the total number of values\n",
    "    unique_ratio = df.apply(lambda x: len(x.unique()) / len(x))\n",
    "    \n",
    "    # Identify columns where the ratio of unique values is below the threshold\n",
    "    near_constant_columns = unique_ratio[unique_ratio < threshold].index.tolist()\n",
    "    \n",
    "    return near_constant_columns\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "###This function models data using a 3-layer NN with dropout & l2 regualrizations. But this code does not\n",
    "## drop neurons while making predictions \n",
    "def pred_modeld(nump,lrpar,l1par):\n",
    "    \n",
    "    seed=42\n",
    "    random.seed(seed)\n",
    "    np.random.seed(seed)\n",
    "    tf.random.set_seed(seed)\n",
    "    \n",
    "    model = Sequential()\n",
    "    model.add(Dropout(0.15, input_shape=(nump,)))\n",
    "\n",
    "    model.add(Dense(32, input_dim=nump, kernel_regularizer=regularizers.l2(l1par)))\n",
    "    model.add(Dropout(0.15, input_shape=(nump,)))\n",
    "    model.add(Activation('relu'))\n",
    "    \n",
    "    model.add(Dense(16, input_dim=nump, kernel_regularizer=regularizers.l2(l1par)))#,\n",
    "    model.add(Dropout(0.15, input_shape=(nump,)))\n",
    "    model.add(Activation('relu'))\n",
    "    \n",
    "    \n",
    "    model.add(Dense(8, input_dim=nump, kernel_regularizer=regularizers.l2(l1par)))#,\n",
    "    model.add(Dropout(0.15, input_shape=(nump,)))\n",
    "    model.add(Activation('relu'))\n",
    "    \n",
    "    model.add(Dense(1, kernel_initializer='normal'))\n",
    "    adam=optimizers.Adam(lr=lrpar)\n",
    "    model.compile(loss='mean_squared_error', optimizer=adam)\n",
    "    \n",
    "    \n",
    "    return model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def pred_model(nump,xtrain,ytrain,lrpar,l1par,xtest,ytest,num_ep,batchsize,patience):\n",
    "    seed=42\n",
    "    random.seed(seed)\n",
    "    np.random.seed(seed)\n",
    "    tf.random.set_seed(seed)\n",
    "    model = Sequential()\n",
    "    model.add(MCDropout(0.15, input_shape=(nump,)))\n",
    "    \n",
    "    model.add(Dense(32, input_dim=nump, kernel_regularizer=regularizers.l2(l1par)))#,\n",
    "               \n",
    "    \n",
    "    model.add(MCDropout(0.15, input_shape=(nump,)))\n",
    "    \n",
    "    model.add(Activation('relu'))\n",
    "    \n",
    "    \n",
    "    model.add(Dense(16, input_dim=nump, kernel_regularizer=regularizers.l2(l1par)))#,\n",
    "               \n",
    "    \n",
    "    model.add(MCDropout(0.15, input_shape=(nump,)))\n",
    "    \n",
    "    model.add(Activation('relu'))\n",
    "    \n",
    "    \n",
    "    model.add(Dense(8, input_dim=nump, kernel_regularizer=regularizers.l2(l1par)))#,\n",
    "                #activity_regularizer=regularizers.l1(l1par[i])))\n",
    "    \n",
    "    model.add(MCDropout(0.15, input_shape=(nump,)))\n",
    "    \n",
    "    model.add(Activation('relu'))\n",
    "    \n",
    "    model.add(Dense(1, kernel_initializer='normal'))\n",
    "    adam=optimizers.Adam(lr=lrpar)\n",
    "    model.compile(loss='mean_squared_error', optimizer=adam)\n",
    "    callbacks = [EarlyStopping(monitor='val_loss',patience=patience),\n",
    "    ModelCheckpoint(filepath='best_model.h5', monitor='val_loss', save_best_only=True)]\n",
    "    model.fit(xtrain,ytrain,epochs=num_ep,callbacks=callbacks,batch_size=batchsize,validation_data=(xtest, ytest),verbose=0) \n",
    "    return model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def rsq_oos(y,ypred): \n",
    "    SSR = np.sum((y-ypred)**2)\n",
    "    MSS = np.sum((y)**2)\n",
    "    r_squared =1-SSR/MSS \n",
    "    return r_squared"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn import linear_model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.preprocessing import MinMaxScaler\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def signals_and_confidence_levels():\n",
    "    seed=42\n",
    "    #random.seed(seed)\n",
    "    np.random.seed(seed)\n",
    "    tf.random.set_seed(seed)\n",
    "    \n",
    "    \n",
    "    \n",
    "    for M in range(1,31):\n",
    "        \n",
    "        \n",
    "        #M=dat[N]\n",
    "        gg1=pd.read_csv('twdata{}.csv'.format(M))\n",
    "        gg2=pd.read_csv('vdata{}.csv'.format(M))\n",
    "        gg3=pd.read_csv('tedata{}.csv'.format(M))\n",
    "\n",
    "        gg1['retmkt']=gg1['retmkt']-gg1['retmkt'].mean()\n",
    "        gg1['mktsize']=gg1['retmkt']*gg1['mvel1']\n",
    "        gg1['mktbm']=gg1['retmkt']*gg1['bm']\n",
    "        gg1['mktmom']=gg1['retmkt']*gg1['mom12m']\n",
    "        \n",
    "        pp=(gg1.columns.values)\n",
    "        gg1[\"yyyymm\"]= gg1[\"yyyymm\"].astype(str) \n",
    "        gg2[\"yyyymm\"]= gg2[\"yyyymm\"].astype(str) \n",
    "        gg3[\"yyyymm\"]= gg3[\"yyyymm\"].astype(str) \n",
    "        for k in range(97,105):\n",
    "            gg1[pp[k]]=(gg1[pp[k]]-np.mean(gg1[pp[k]]))/np.std(gg1[pp[k]])\n",
    "            gg2[pp[k]]=(gg2[pp[k]]-np.mean(gg1[pp[k]]))/np.std(gg1[pp[k]])\n",
    "            gg3[pp[k]]=(gg3[pp[k]]-np.mean(gg1[pp[k]]))/np.std(gg1[pp[k]]) \n",
    "        \n",
    "        \n",
    "        gg1=gg1.drop(columns=['count'])\n",
    "        gg2=gg2.drop(columns=['count'])\n",
    "        gg3=gg3.drop(columns=['count'])\n",
    "        gg1.columns[(gg1 == 0).all()]\n",
    "\n",
    "     \n",
    "        \n",
    "        drop_col_names=identify_near_constant_predictors(gg1.iloc[:,3:97])\n",
    "        gg1=gg1.drop(columns=drop_col_names)\n",
    "        gg2=gg2.drop(columns=drop_col_names)\n",
    "        gg3=gg3.drop(columns=drop_col_names)\n",
    "        \n",
    "        ##Dropping predictors that are nearly contant \n",
    "        nunique = gg1.nunique()\n",
    "        cols_to_drop = nunique[nunique == 1].index\n",
    "        gg1=gg1.drop(cols_to_drop, axis=1)\n",
    "        gg2=gg2.drop(cols_to_drop, axis=1)\n",
    "        gg3=gg3.drop(cols_to_drop, axis=1)\n",
    "        \n",
    "        \n",
    "        \n",
    "        \n",
    "        pp=(gg2.columns.values)\n",
    "        num_cols=pp[3:].size+3\n",
    "\n",
    "        \n",
    "        xtrain=((gg1.iloc[:,3:num_cols]))\n",
    "        xtest=((gg2.iloc[:,3:num_cols]))\n",
    "        xoos=(gg3.iloc[:,3:num_cols])\n",
    "        xerror=gg1[['retmkt','mktsize','mktbm','mktmom']]\n",
    "        ytrain=(gg1.iloc[:,2])\n",
    "\n",
    "        ytest=gg2.iloc[:,2]\n",
    "        yoos=gg3.iloc[:,2]\n",
    "\n",
    "        \n",
    "        ##Standardizing data\n",
    "        xtraind=(xtrain-np.mean(xtrain))/np.std(xtrain)\n",
    "        xtestd=(xtest-np.mean(xtrain))/np.std(xtrain)\n",
    "        xoosd=(xoos-np.mean(xtrain))/np.std(xtrain)\n",
    "           \n",
    "        l1par=0.0001\n",
    "        l1par2=0.0001\n",
    "        #l1par=0.000035\n",
    "        #l1par2=0.000035\n",
    "        lrpar=0.001\n",
    "        lrpar2=0.001\n",
    "        patience=10\n",
    "        batchsize=10000\n",
    "        modeld=pred_model(num_cols-3,xtraind,ytrain,lrpar,l1par,xtestd,ytest,100,batchsize,patience)\n",
    "        \n",
    "       \n",
    "        mytestmodel=pred_modeld(num_cols-3,lrpar,l1par)\n",
    "        mytestmodel.set_weights(modeld.get_weights())\n",
    "        ytrainpred=mytestmodel.predict(xtraind).ravel()\n",
    "        errortrain=ytrain-(ytrainpred)\n",
    "        \n",
    "        reg = LinearRegression().fit(xerror, errortrain)\n",
    "        errorfit=reg.predict(xerror)\n",
    "        xyerror=xerror.transpose().dot(errortrain)\n",
    "        errorvar=np.var(errortrain-errorfit)\n",
    "        \n",
    "        Aerror= xerror.transpose().dot(xerror)\n",
    "        Aerrorinv=np.linalg.inv(Aerror)\n",
    "        errorbetain=Aerrorinv.dot(xyerror)\n",
    "        \n",
    "        \n",
    "        confid=[]\n",
    "        predmean=[]\n",
    "        errorpar=[1]\n",
    "        for j in range (4): ##alternatively, use ``j in range(5)\" instead\" if the factor loading uncertainty\n",
    "                            ## needs to be accounted for\n",
    "            \n",
    "            #errorbetains=errorbetain*errorpar[j]\n",
    "            \n",
    "            errorbetains=np.random.multivariate_normal(errorbetain,0.00001*Aerrorinv)\n",
    "\n",
    "            ## use this expression instead for errorbetains size(j)>1, this expression accounts \n",
    "            ##for the factor loaidng uncertainty \n",
    "            \n",
    "            \n",
    "            errorcor=xerror.dot(errorbetains)\n",
    "            #errortcor=xterror.dot(errorbetains)\n",
    "            ytrainerror=ytrain-errorcor\n",
    "            modelnd=pred_model(num_cols-3,xtraind,ytrainerror,lrpar2,l1par2,xtestd,ytest,100,batchsize,patience)\n",
    "            yoos_pred_st=np.stack([modelnd.predict(xoosd) for sample in range(100)]) \n",
    "            ## Code for obtaining CIs\n",
    "            confid.append(yoos_pred_st.std(axis=0))\n",
    "            predmean.append(yoos_pred_st.mean(axis=0))\n",
    "\n",
    "        confide=np.square(confid)\n",
    "        confiden=np.sqrt(np.var(predmean,axis=0)+np.mean(confide,axis=0))\n",
    "        yoosbayes=np.mean(predmean,axis=0)\n",
    "        yoosbayesconf=pd.DataFrame(confiden, columns = ['yoosbayesconf'])\n",
    "        yoosbayespred=pd.DataFrame(yoosbayes, columns = ['yoosbayespred'])\n",
    "        yoosfreqpred=pd.DataFrame(yoosbayes, columns = ['yoosfreqpred'])\n",
    "        hh=yoosbayespred\n",
    "        date_and_permno = gg3.iloc[0:, [0,1]]\n",
    "        yoosc=gg3.iloc[0:,2:3]\n",
    "\n",
    "\n",
    "        mergeout1 = np.concatenate((date_and_permno,yoosfreqpred,yoosbayespred,yoosbayesconf,yoosc),axis=1)\n",
    "        mergeout1 = pd.DataFrame(mergeout1)\n",
    "        mergeout1.to_csv('D:/bayesian jmp/nn_output/neural3rawork{}.csv'.format(M), index=False, header=False)\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "       \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "signals_and_confidence_levels()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## **More Recent Sample**\n",
    "\n",
    "In the most recent sample uploaded by GKX, covering the years 2017–2020:\n",
    "\n",
    "1. The **price delay variable** is not available.  \n",
    "2. The **order of the first few columns** differs from the earlier sample.  \n",
    "\n",
    "So, adjust the earlier code accordingly to **account for these differences**.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def signals_and_confidence_levels():\n",
    "    seed=100\n",
    "    #random.seed(seed)\n",
    "    np.random.seed(seed)\n",
    "    tf.random.set_seed(seed)\n",
    "    \n",
    "    \n",
    "    \n",
    "    for M in range(31,35):\n",
    "        \n",
    "        \n",
    "        #M=dat[N]\n",
    "        gg1=pd.read_csv('twdata{}.csv'.format(M))\n",
    "        gg2=pd.read_csv('vdata{}.csv'.format(M))\n",
    "        gg3=pd.read_csv('tedata{}.csv'.format(M))\n",
    "\n",
    "        \n",
    "        gg3['dp']=np.log(gg3['dp'])\n",
    "        gg2['dp']=np.log(gg2['dp'])\n",
    "        gg1['dp']=np.log(gg1['dp'])\n",
    "        \n",
    "        \n",
    "        gg3['earntoprice']=np.log(gg3['earntoprice'])\n",
    "        gg2['earntoprice']=np.log(gg2['earntoprice'])\n",
    "        gg1['earntoprice']=np.log(gg1['earntoprice'])\n",
    "        \n",
    "        #gg1['mvel1']=np.log(gg1['mvel1']+0.0001)\n",
    "        #gg2['mvel1']=np.log(gg2['mvel1']+0.0001)\n",
    "        #gg3['mvel1']=np.log(gg3['mvel1']+0.0001)\n",
    "        \n",
    "        \n",
    "        gg1['retmkt']=gg1['retmkt']-gg1['retmkt'].mean()\n",
    "        gg1['mktsize']=gg1['retmkt']*gg1['mvel1']\n",
    "        gg1['mktbm']=gg1['retmkt']*gg1['bm']\n",
    "        gg1['mktmom']=gg1['retmkt']*gg1['mom12m']\n",
    "        \n",
    "\n",
    "        \n",
    "        \n",
    "        pp=(gg1.columns.values)\n",
    "        gg1[\"yyyymm\"]= gg1[\"yyyymm\"].astype(str) \n",
    "        gg2[\"yyyymm\"]= gg2[\"yyyymm\"].astype(str) \n",
    "        gg3[\"yyyymm\"]= gg3[\"yyyymm\"].astype(str) \n",
    "        for k in range(97,105):\n",
    "            gg1[pp[k]]=(gg1[pp[k]]-np.mean(gg1[pp[k]]))/np.std(gg1[pp[k]])\n",
    "            gg2[pp[k]]=(gg2[pp[k]]-np.mean(gg1[pp[k]]))/np.std(gg1[pp[k]])\n",
    "            gg3[pp[k]]=(gg3[pp[k]]-np.mean(gg1[pp[k]]))/np.std(gg1[pp[k]]) \n",
    "        \n",
    "        \n",
    "        gg1=gg1.drop(columns=['count'])\n",
    "        gg2=gg2.drop(columns=['count'])\n",
    "        gg3=gg3.drop(columns=['count'])\n",
    "        gg1.columns[(gg1 == 0).all()]\n",
    "\n",
    "     \n",
    "        \n",
    "        drop_col_names=identify_near_constant_predictors(gg1.iloc[:,3:96])\n",
    "        gg1=gg1.drop(columns=drop_col_names)\n",
    "        gg2=gg2.drop(columns=drop_col_names)\n",
    "        gg3=gg3.drop(columns=drop_col_names)\n",
    "        \n",
    "        ##Dropping predictors that are nearly contant \n",
    "        nunique = gg1.nunique()\n",
    "        #cols_to_drop = nunique[nunique == 1].index\n",
    "        #gg1=gg1.drop(cols_to_drop, axis=1)\n",
    "        #gg2=gg2.drop(cols_to_drop, axis=1)\n",
    "        #gg3=gg3.drop(cols_to_drop, axis=1)\n",
    "        \n",
    "        \n",
    "        \n",
    "        \n",
    "        pp=(gg2.columns.values)\n",
    "        num_cols=pp[3:].size+3\n",
    "\n",
    "        \n",
    "        xtrain=((gg1.iloc[:,3:num_cols]))\n",
    "        xtest=((gg2.iloc[:,3:num_cols]))\n",
    "        xoos=(gg3.iloc[:,3:num_cols])\n",
    "        xerror=gg1[['retmkt','mktsize','mktbm','mktmom']]\n",
    "        ytrain=(gg1.iloc[:,1])\n",
    "\n",
    "        ytest=gg2.iloc[:,1]\n",
    "        yoos=gg3.iloc[:,1]\n",
    "\n",
    "        \n",
    "        ##Standardizing data\n",
    "        #ytraind=(ytrain-np.mean(ytrain,axis=0))/np.std(ytrain)\n",
    "        #ytestd=(ytest-np.mean(ytrain,axis=0))/np.std(ytrain)\n",
    "        xtraind=(xtrain-np.mean(xtrain))/np.std(xtrain)\n",
    "        xtestd=(xtest-np.mean(xtrain))/np.std(xtrain)\n",
    "        xoosd=(xoos-np.mean(xtrain))/np.std(xtrain)\n",
    "           \n",
    "        l1par=0.0001\n",
    "        l1par2=0.0001\n",
    "        lrpar=0.001\n",
    "        lrpar2=0.001\n",
    "        patience=10\n",
    "        batchsize=10000\n",
    "        modeld=pred_model(num_cols-3,xtraind,ytrain,lrpar,l1par,xtestd,ytest,100,batchsize,patience)\n",
    "        \n",
    "       \n",
    "        mytestmodel=pred_modeld(num_cols-3,lrpar,l1par)\n",
    "        mytestmodel.set_weights(modeld.get_weights())\n",
    "        ytrainpred=mytestmodel.predict(xtraind).ravel()\n",
    "        errortrain=ytrain-(ytrainpred)\n",
    "        \n",
    "        reg = LinearRegression().fit(xerror, errortrain)\n",
    "        errorfit=reg.predict(xerror)\n",
    "        xyerror=xerror.transpose().dot(errortrain)\n",
    "        errorvar=np.var(errortrain-errorfit)\n",
    "        \n",
    "        Aerror= xerror.transpose().dot(xerror)\n",
    "        Aerrorinv=np.linalg.inv(Aerror)\n",
    "        errorbetain=Aerrorinv.dot(xyerror)\n",
    "        \n",
    "        \n",
    "        confid=[]\n",
    "        predmean=[]\n",
    "        errorpar=[1]\n",
    "        for j in range (4): ##alternatively, use ``j in range(5)\" instead\" if the factor loading uncertainty\n",
    "                            ## needs to be accounted for\n",
    "            \n",
    "            #errorbetains=errorbetain*errorpar[j] (0.0000001 works)\n",
    "            \n",
    "            errorbetains=np.random.multivariate_normal(errorbetain,0.00001*Aerrorinv)\n",
    "\n",
    "            ## use this expression instead for errorbetains size(j)>1, this expression accounts \n",
    "            ##for the factor loaidng uncertainty \n",
    "            \n",
    "            \n",
    "            errorcor=xerror.dot(errorbetains)\n",
    "            #errortcor=xterror.dot(errorbetains)\n",
    "            ytrainerror=ytrain-errorcor\n",
    "            modelnd=pred_model(num_cols-3,xtraind,ytrainerror,lrpar2,l1par2,xtestd,ytest,100,batchsize,patience)\n",
    "            yoos_pred_st=np.stack([modelnd.predict(xoosd) for sample in range(100)]) \n",
    "            ## Code for obtaining CIs\n",
    "            confid.append(yoos_pred_st.std(axis=0))\n",
    "            predmean.append(yoos_pred_st.mean(axis=0))\n",
    "\n",
    "        confide=np.square(confid)\n",
    "        confiden=np.sqrt(np.var(predmean,axis=0)+np.mean(confide,axis=0))\n",
    "        yoosbayes=np.mean(predmean,axis=0)\n",
    "        yoosbayesconf=pd.DataFrame(confiden, columns = ['yoosbayesconf'])\n",
    "        yoosbayespred=pd.DataFrame(yoosbayes, columns = ['yoosbayespred'])\n",
    "        yoosfreqpred=pd.DataFrame(yoosbayes, columns = ['yoosfreqpred'])\n",
    "        hh=yoosbayespred\n",
    "        date_and_permno = gg3.iloc[0:, [0,2]]\n",
    "        yoosc=gg3.iloc[0:,1:2]\n",
    "\n",
    "\n",
    "        mergeout1 = np.concatenate((date_and_permno,yoosfreqpred,yoosbayespred,yoosbayesconf,yoosc),axis=1)\n",
    "        mergeout1 = pd.DataFrame(mergeout1)\n",
    "        mergeout1.to_csv('D:/bayesian jmp/nn_output/neural3rawork{}.csv'.format(M), index=False, header=False)\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "       \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "signals_and_confidence_levels()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
